A Fast Algorithm for the Minimum Covariance Determinant Estimator

Authors

  • Peter Rousseeuw
  • Katrien van Driessen
Abstract

The minimum covariance determinant (MCD) method of Rousseeuw (1984) is a highly robust estimator of multivariate location and scatter. Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant. Until now applications of the MCD were hampered by the computation time of existing algorithms, which were limited to a few hundred objects in a few dimensions. We discuss two important applications of larger size: one about a production process at Philips with n = 677 objects and p = 9 variables, and a data set from astronomy with n = 137,256 objects and p = 27 variables. To deal with such problems we have developed a new algorithm for the MCD, called FAST-MCD. The basic ideas are an inequality involving order statistics and determinants, and techniques which we call 'selective iteration' and 'nested extensions'. For small data sets FAST-MCD typically finds the exact MCD, whereas for larger data sets it gives more accurate results than existing algorithms and is faster by orders of magnitude. Moreover, FAST-MCD is able to detect an exact fit, i.e. a hyperplane containing h or more observations. The new algorithm makes the MCD method available as a routine tool for analyzing multivariate data. We also propose the distance-distance plot (or 'D-D plot') which displays MCD-based robust distances versus Mahalanobis distances, and illustrate it with some examples.

We wish to thank Doug Hawkins and José Agulló for making their programs available to us. We also want to dedicate special thanks to Gertjan Otten, Frans Van Dommelen and Herman Veraa for giving us access to the Philips data, and to S.C. Odewahn and his research group at the California Institute of Technology for allowing us to analyze their Digitized Palomar data.
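The MCD objective and the D-D plot coordinates mentioned in the abstract can be illustrated with a small sketch. This is not the FAST-MCD implementation: it only shows a refinement step of the kind usually called a concentration step (fit the mean and covariance on an h-subset, then keep the h points with the smallest distances), run from several random starts. The function names, the random h-subset starts, the simulated data, and the absence of any consistency or reweighting correction are all assumptions made for illustration; selective iteration and nested extensions are not implemented here.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from `center` under `cov`."""
    diff = X - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)

def concentration_step(X, subset, h):
    """Fit mean/covariance on `subset`, then return the h points closest to that fit."""
    center = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    return np.argsort(mahalanobis_sq(X, center, cov))[:h]

def mcd_sketch(X, h, n_starts=20, n_steps=30, seed=0):
    """Illustrative multi-start search for an h-subset with a small covariance determinant."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_det, best_subset = np.inf, None
    for _ in range(n_starts):
        # Assumption: start from a random h-subset (a simplification for this sketch).
        subset = rng.choice(n, size=h, replace=False)
        for _ in range(n_steps):
            new_subset = concentration_step(X, subset, h)
            if np.array_equal(np.sort(new_subset), np.sort(subset)):
                break  # the subset no longer changes
            subset = new_subset
        det = np.linalg.det(np.cov(X[subset], rowvar=False))
        if det < best_det:
            best_det, best_subset = det, subset
    return best_subset, best_det

def dd_plot_coordinates(X, subset):
    """Return (classical Mahalanobis distances, robust distances) for a D-D plot."""
    md = np.sqrt(mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False)))
    rd = np.sqrt(mahalanobis_sq(X, X[subset].mean(axis=0),
                                np.cov(X[subset], rowvar=False)))
    return md, rd

# Simulated data (hypothetical): 100 regular points and 20 clustered outliers.
rng = np.random.default_rng(1)
clean = rng.normal(size=(100, 2))
outliers = rng.normal(loc=10.0, scale=0.1, size=(20, 2))
X = np.vstack([clean, outliers])

h = 90
subset, det = mcd_sketch(X, h)
md, rd = dd_plot_coordinates(X, subset)
```

Plotting `rd` against `md` gives the D-D plot: points whose robust distance is far larger than their classical distance are the ones the classical estimate masks. The sketch uses the raw subset covariance, so the robust distances are not calibrated against a chi-squared cutoff.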


Similar resources

Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator

The purpose of this paper is to identify the points that affect the performance of one of the important algorithms of data mining, namely the support vector machine. The final classification decision is based on a small portion of the data called support vectors. So, the existence of atypical observations among these points will result in deviation from the correct decision. Thus...

Influence Function and Efficiency of the Minimum Covariance Determinant Scatter Matrix Estimator

The Minimum Covariance Determinant (MCD) scatter estimator is a highly robust estimator for the dispersion matrix of a multivariate, elliptically symmetric distribution. It is relatively fast to compute and intuitively appealing. In this note we derive its influence function and compute the asymptotic variances of its elements. A comparison with the one step reweighted MCD and with S-estimators ...

Robustified distance based fuzzy membership function for support vector machine classification

Fuzzification of the support vector machine has been utilized to deal with the problem of outliers and noise. This is achieved by means of a fuzzy membership function, which is generally built based on the distance of the points to the class centroid. The focus of this research is twofold. Firstly, by taking advantage of robust statistics in the fuzzy SVM, more emphasis on reducing the im...

High-breakdown estimation of multivariate mean and covariance with missing observations.

We consider the problem of outliers in incomplete multivariate data when the aim is to estimate a measure of mean and covariance, as is the case, for example, in factor analysis. The ER algorithm of Little and Smith which combines the EM algorithm for missing data and a robust estimation step based on an M-estimator could be used in such a situation. However, the ER algorithm as originally prop...

Nonsingular Robust Covariance Estimation in Multivariate Outlier Detection

Rousseeuw’s minimum covariance determinant (MCD) method is a highly robust estimator of multivariate mean and covariance. In practice, the MCD covariance estimator may be singular. However, a nonsingular covariance estimator is required to calculate the Mahalanobis distance. In order to fix this singularity problem, we propose an improved version of the MCD estimator, which is a combination of the...


Journal:
  • Technometrics

Volume 41  Issue 

Pages  -

Publication date: 1999